\section{Discussion}
\labsec{discussion}

Rather than searching for a better model, in this project I searched for 
better predictors. Using the affinities instead of the genotypes as 
predictors has several advantages:

\begin{itemize}
  \item The model is more interpretable;
  \item The number of predictors is smaller;
  \item The predictive power is higher.
\end{itemize}

Taking biochemical considerations into account, it is possible to find 
even better predictors, although these violate the constraint of using 
only the DNA sequence to predict gene expression.

This work could be extended in several ways. One would be to construct 
an ensemble model that chooses, for each gene, the model that achieved 
the best prediction on held-out validation data.
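The per-gene selection step can be sketched in a few lines of Python. This is a minimal illustration, not the method used in this project: it assumes that, for every gene, each candidate model's predictions on a held-out set of individuals are already available as lists, and it scores them by mean squared error. Here the selection is scored on held-out data, since scoring on the data used for fitting would tend to reward the most overfit model. The function names \texttt{mse} and \texttt{select\_best\_model\_per\_gene}, the gene names, the model names \texttt{affinity\_model} and \texttt{genotype\_model}, and the toy numbers are all hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def mse(y_true: List[float], y_pred: List[float]) -> float:
    """Mean squared error between observed and predicted expression."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def select_best_model_per_gene(
    val_predictions: Dict[str, Dict[str, List[float]]],
    val_observed: Dict[str, List[float]],
) -> Dict[str, str]:
    """For each gene, return the name of the candidate model with the
    lowest validation MSE.

    val_predictions[gene][model_name] holds that model's predictions for
    the held-out individuals; val_observed[gene] holds the measured values.
    """
    best = {}
    for gene, preds_by_model in val_predictions.items():
        y = val_observed[gene]
        best[gene] = min(preds_by_model,
                         key=lambda name: mse(y, preds_by_model[name]))
    return best


# Toy data; gene names, model names and values are hypothetical.
val_predictions = {
    "g1": {"affinity_model": [1.1, 2.0, 2.9], "genotype_model": [0.5, 2.5, 3.5]},
    "g2": {"affinity_model": [1.0, 0.0], "genotype_model": [0.1, 0.9]},
}
val_observed = {"g1": [1.0, 2.0, 3.0], "g2": [0.0, 1.0]}
print(select_best_model_per_gene(val_predictions, val_observed))
# {'g1': 'affinity_model', 'g2': 'genotype_model'}
```

At prediction time, the ensemble would simply call, for each gene, the model recorded in the returned dictionary.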

Another possibility is to further exploit the interpretability of this 
model and use the trees produced by BART to make inferences about the 
regulatory network among genes.
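One possible starting point, sketched below under explicit assumptions, is the variable inclusion proportion commonly used as a variable-importance measure for BART: for each posterior draw, the fraction of all splitting rules in the sum-of-trees that use a given predictor, averaged over draws. Predictors whose proportion exceeds a chosen threshold could then be read as candidate regulator-to-target edges. This is an illustration only: it assumes the split variables of each tree have already been extracted from the fitted model (how to do so depends on the BART implementation), the predictor names \texttt{TF\_A}, \texttt{TF\_B}, \texttt{TF\_C}, the helper names and the threshold are hypothetical, and a high inclusion proportion indicates association, not necessarily direct regulation.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One posterior draw = the whole sum-of-trees at that MCMC iteration,
# represented (by assumption) as a list of trees, each tree being the
# list of predictor names used in its splitting rules.
Draw = List[List[str]]


def inclusion_proportions(draws: List[Draw]) -> Dict[str, float]:
    """Average over posterior draws of the fraction of splitting rules
    that use each predictor."""
    per_draw = []
    for forest in draws:
        counts = Counter(var for tree in forest for var in tree)
        total = sum(counts.values())
        if total:  # skip draws in which every tree is a single leaf
            per_draw.append({var: c / total for var, c in counts.items()})
    if not per_draw:
        return {}
    variables = {var for props in per_draw for var in props}
    return {var: sum(p.get(var, 0.0) for p in per_draw) / len(per_draw)
            for var in variables}


def candidate_edges(props_by_gene: Dict[str, Dict[str, float]],
                    threshold: float) -> List[Tuple[str, str, float]]:
    """Return sorted (predictor, gene, proportion) triples whose inclusion
    proportion exceeds the threshold, i.e. candidate edges."""
    return sorted((var, gene, p)
                  for gene, props in props_by_gene.items()
                  for var, p in props.items()
                  if p > threshold)


# Two toy posterior draws for a hypothetical gene "g1".
draws_g1 = [
    [["TF_A", "TF_B"], ["TF_A"]],
    [["TF_A"], ["TF_C"]],
]
props = inclusion_proportions(draws_g1)
edges = candidate_edges({"g1": props}, threshold=0.3)
# Expected: a single edge TF_A -> g1, with proportion 7/12 (about 0.583).
print(edges)
```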

Finally, one of the biggest limitations of this kind of model is that it 
treats each gene as independent of all the others. One possible way to 
exploit the information shared across genes is what I call the 
\enquote{bagging of the genes,} where the prediction for a new gene is 
the average of the predictions of several models, each trained on a 
different gene.
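A minimal sketch of this averaging idea follows. It assumes that every fitted per-gene model is a function mapping a predictor vector (defined in the same way for all genes, e.g.\ the same set of affinities) to a predicted expression value; the function name \texttt{bagging\_of\_genes}, the optional random subset of size \texttt{k}, and the toy models are hypothetical. Note that, unlike bagging in the strict statistical sense, no bootstrap resampling of individuals is performed here: the sketch only averages models trained on different genes, as described above.

```python
import random
from statistics import fmean
from typing import Callable, Dict, Optional, Sequence

# A fitted per-gene model: maps a predictor vector to predicted expression.
Model = Callable[[Sequence[float]], float]


def bagging_of_genes(gene_models: Dict[str, Model], x_new: Sequence[float],
                     k: Optional[int] = None,
                     seed: Optional[int] = None) -> float:
    """Predict expression for a new gene as the mean prediction of models
    trained on other genes; if k is given, average a random subset of k."""
    names = sorted(gene_models)
    if not names:
        raise ValueError("at least one trained model is required")
    if k is not None:
        if not 1 <= k <= len(names):
            raise ValueError("k must be between 1 and the number of models")
        names = random.Random(seed).sample(names, k)
    return fmean(gene_models[name](x_new) for name in names)


# Toy stand-ins for models fitted on three other genes (hypothetical).
gene_models = {
    "g1": lambda x: sum(x),
    "g2": lambda x: 2.0 * sum(x),
    "g3": lambda x: 0.0,
}
print(bagging_of_genes(gene_models, [0.5, 1.0]))  # 1.5
```

A plain mean is the simplest choice; a weighted average, for instance favoring genes whose models performed well on held-out data, would be a natural variant.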

If we were able to predict gene expression accurately, the benefit would 
be twofold: first, it would be possible to identify which individuals are 
at risk of developing a disease, and consequently to take preventive 
measures; second, the biological mechanisms through which diseases arise 
would be elucidated, potentially leading to the discovery of new 
therapeutic targets. In conclusion, I hope that this project will make a 
contribution, albeit a small one, to understanding the relationships 
among genome, expression, and disease.
